xen.git
Ian Campbell [Wed, 1 Jun 2011 15:40:54 +0000 (16:40 +0100)]
hvmloader: enable PCI_COMMAND_IO on primary VGA device

There is an implicit assumption in the PCI spec that the primary VGA
device (e.g. something with class==VGA) will have I/O enabled in order
to make the standard VGA I/O registers (e.g. at 0x3xx) available, even
though the device has no explicit I/O BARs.

The qemu device model for the Cirrus VGA card does not actually
enforce this but SeaBIOS looks for a VGA device with I/O enabled
before running the VGA ROM. Coreboot has similar behaviour and I
verified on a physical Cirrus GD 5446 that the BIOS had enabled I/O
cycles.
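
As a rough illustration of the change described above, the sketch below
sets the I/O enable bit in a command register value when the device class
is VGA. The helper name vga_enable_io is hypothetical and not taken from
hvmloader; the two constants are the standard PCI values (command register
bit 0, class code 0x0300).

```c
#include <stdint.h>

/* Standard PCI values. */
#define PCI_COMMAND_IO        0x0001 /* command bit 0: respond to I/O cycles */
#define PCI_CLASS_DISPLAY_VGA 0x0300 /* base class 0x03, subclass 0x00 */

/*
 * Hypothetical helper: given a device's class code and its current
 * command register value, return the value to write back.
 */
static uint16_t vga_enable_io(uint16_t class_code, uint16_t cmd)
{
    if (class_code == PCI_CLASS_DISPLAY_VGA)
        cmd |= PCI_COMMAND_IO; /* enable the legacy VGA I/O ports */
    return cmd;
}
```

Other command bits are left untouched, so a device that already has memory
decoding or bus mastering enabled keeps those settings.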

The thread at
http://www.seabios.org/pipermail/seabios/2011-May/001804.html
contains more info.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Ian Campbell [Wed, 1 Jun 2011 15:39:55 +0000 (16:39 +0100)]
hvmloader: allow per-BIOS decision on loading option ROMs

SeaBIOS has functionality to load ROMs from the PCI device directly,
so it makes sense to use this when it is available.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Machon Gregory [Wed, 1 Jun 2011 15:12:29 +0000 (16:12 +0100)]
xsm: fixed missing header compile error

Fixes compile error caused by changeset 23363 by including xenoprof.h
header.

Signed-off-by: Machon Gregory <mbgrego@tycho.ncsc.mil>
Keir Fraser [Tue, 31 May 2011 12:57:45 +0000 (13:57 +0100)]
x86: Fix spurious_page_fault() for 1GB superpages.

From: Xin Li <xin.li@intel.com>
Signed-off-by: Keir Fraser <keir@xen.org>
Christoph Egger [Tue, 31 May 2011 12:55:50 +0000 (13:55 +0100)]
nestedsvm: fix tlb_control

On VMRUN emulation, evaluate the virtual tlb_control only, to match
hw behaviour. Deal with L1 guests which use flush-by-asid without
checking cpuid bits or which fill tlb_control with random data.

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Liu, Jinsong [Tue, 31 May 2011 12:53:54 +0000 (13:53 +0100)]
x86: cpufreq init cleanup

c/s 20325 changed the AMD cpufreq init logic.  Before that, AMD CPUs
started cpufreq init only when all CPUs were ready.  c/s 20325 changed
it to happen on per-CPU add, but left the code inelegant.

This patch does a little cleanup work.

Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Keir Fraser [Tue, 31 May 2011 12:52:42 +0000 (13:52 +0100)]
nestedhvm: Fix wrong memory size of nested shadow_io_bitmap
Signed-off-by: Eddie Dong <eddie.dong@intel.com>
While there, simplify and tidy the code.
Signed-off-by: Keir Fraser <keir@xen.org>
Wei Huang [Sat, 28 May 2011 07:58:08 +0000 (08:58 +0100)]
HVM/SVM: enable tsc scaling ratio for SVM

Future AMD CPUs support TSC scaling. It allows guests to have a
different TSC frequency from the host system using this formula: guest_tsc
= host_tsc * tsc_ratio + vmcb_offset. The tsc_ratio is a 64-bit MSR
containing a fixed-point number in 8.32 format (8 bits for the integer
part and 32 bits for the fractional part). For instance 0x00000003_80000000
means tsc_ratio=3.5.
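
The formula can be worked through in software with plain 64-bit
arithmetic. This is only a sketch of the arithmetic, not the hypervisor
code: the function names are made up for illustration, and the fractional
product is truncated, which is an assumption about rounding.

```c
#include <stdint.h>

/*
 * Multiply host_tsc by an 8.32 fixed-point ratio without needing a
 * 128-bit intermediate. Bits 39:32 of tsc_ratio are the integer part,
 * bits 31:0 the fractional part.
 */
static uint64_t scale_tsc(uint64_t host_tsc, uint64_t tsc_ratio)
{
    uint64_t int_part  = (tsc_ratio >> 32) & 0xff;
    uint64_t frac_part = tsc_ratio & 0xffffffffULL;
    uint64_t hi = host_tsc >> 32;
    uint64_t lo = host_tsc & 0xffffffffULL;

    /* host * int + floor(host * frac / 2^32), split into 32-bit halves */
    return host_tsc * int_part + hi * frac_part + ((lo * frac_part) >> 32);
}

/* guest_tsc = host_tsc * tsc_ratio + vmcb_offset */
static uint64_t guest_tsc(uint64_t host_tsc, uint64_t tsc_ratio,
                          uint64_t vmcb_offset)
{
    return scale_tsc(host_tsc, tsc_ratio) + vmcb_offset;
}
```

With the ratio from the example, scale_tsc(1000, 0x0000000380000000ULL)
yields 3500, i.e. 1000 * 3.5.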

This patch enables TSC scaling ratio for SVM. With it, guest VMs don't
need to take a #VMEXIT to calculate a translated TSC value when
running under TSC emulation mode. This can substantially reduce the
rdtsc overhead.

Signed-off-by: Wei Huang <wei.huang2@amd.com>
Yang, Wei [Sat, 28 May 2011 07:57:12 +0000 (08:57 +0100)]
x86/intel: Fix CPUID leaf 7 detection

Must set subleaf to 0 (input ECX==0).

Signed-off-by: Yang, Wei <wei.y.yang@intel.com>
Signed-off-by: Li, Xin <xin.li@intel.com>
Keir Fraser [Sat, 28 May 2011 07:33:54 +0000 (08:33 +0100)]
mem_event: Revert pointless, unrelated, and broken (on i386) change in 23434:ef410f262299

vcpu_pause() is nestable in the hypervisor, hence checking for
already-paused is not required.
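
To show what "nestable" means here, a minimal sketch with a pause counter
follows. The struct and function names are hypothetical stand-ins, not the
hypervisor's own definitions, and the real code also has to deal with
atomicity and descheduling, which this leaves out.

```c
/* Hypothetical stand-in for the per-vcpu pause state. */
struct vcpu_sketch {
    unsigned int pause_count; /* number of outstanding pause requests */
};

static void sketch_vcpu_pause(struct vcpu_sketch *v)
{
    v->pause_count++; /* safe to call on an already paused vcpu */
}

static void sketch_vcpu_unpause(struct vcpu_sketch *v)
{
    if (v->pause_count > 0)
        v->pause_count--;
}

/* The vcpu may run only once every pause has been matched by an unpause. */
static int sketch_vcpu_runnable(const struct vcpu_sketch *v)
{
    return v->pause_count == 0;
}
```

Because each pause is counted, a caller never needs to ask whether the
vcpu is already paused before pausing it again.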

Signed-off-by: Keir Fraser <keir@xen.org>
Aravindh Puthiyaparambil [Fri, 27 May 2011 17:44:26 +0000 (18:44 +0100)]
mem_event: Allow memory access listener to perform single step execution.

Add a new memory event that handles single step. This allows the
memory access listener to handle instructions that modify data within
the execution page.  This can be enabled in the listener by doing:

    xc_set_hvm_param(xch, domain_id, HVM_PARAM_MEMORY_EVENT_SINGLE_STEP,
                     HVMPME_mode_sync)

Now the listener can start single stepping by:

    xc_domain_debug_control(xch, domain_id,
                            XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_ON, vcpu_id)

And stop single stepping by:

    xc_domain_debug_control(xch, domain_id,
                            XEN_DOMCTL_DEBUG_OP_SINGLE_STEP_OFF, vcpu_id)

Signed-off-by: Aravindh Puthiyaparambil <aravindh@virtuata.com>
Acked-by: Tim Deegan <Tim.Deegan@citrix.com>
Tim Deegan [Fri, 27 May 2011 17:41:12 +0000 (18:41 +0100)]
Fix the filename of the archive produced by 'make deb'

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Keir Fraser [Fri, 27 May 2011 14:49:24 +0000 (15:49 +0100)]
Clean up stdarg handling a little. Fix for NetBSD.

Signed-off-by: Keir Fraser <keir@xen.org>
Tim Deegan [Fri, 27 May 2011 07:56:47 +0000 (08:56 +0100)]
xen/x86: Add -Wnested-externs to CFLAGS

This will catch any new extern declarations that actually appear
inside function bodies.  Unfortunately there's no equivalent
warning for extern declarations at root level in .c files.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Tim Deegan [Fri, 27 May 2011 07:56:12 +0000 (08:56 +0100)]
xen: remove more declarations from C files.

This patch moves some more, mostly data, extern declarations into
header files.   I haven't been as strict as I was with functions;
in particular there are a number of declarations of assembler labels
that are only used in one place.  I've also left a few compat-mode
tricks, and all the magic in symbols.c

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Ian Campbell [Thu, 26 May 2011 16:16:47 +0000 (17:16 +0100)]
libxl: use preferred syntax for network device creation with upstream qemu

Markus Armbruster points out in <m3r582pzc1.fsf@blackfin.pond.sub.org>
on qemu-devel that this is the preferred syntax going forward. Using it avoids
needlessly instantiating a qemu "vlan" and instead creates a simple host end
point and device.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Cc: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Konrad Rzeszutek Wilk [Thu, 26 May 2011 13:56:37 +0000 (09:56 -0400)]
libxl: Add 'e820_host' option to config file.

.. which will be removed once the auto-ballooning of guests
with PCI devices works. During testing of the patches which provide
a host E820 in a PV guest, certain inconsistencies were found with
guests. When launching a RHEL5 or SLES11 PV guest with 4GB and a PCI device,
the kernel would report 4GB, but have 1.5G "used". What happened was that
the P2M entries that fall within the E820 I/O holes would never be used and
were just wasted. The mechanism to work around this is to shrink the size of
the guest before launch (say memory=2048, maxmem=4096) and then balloon back
to 4096M after start. PVOPS type kernels would detect the E820 I/O holes and
deflate by the correct amount but would not inflate back to 4GB.
Manually inflating makes it work.

The fix in the future for guests where the memory amount flows over the
PCI hole, is to launch the guest with decreased amount right up to the cusp
of where the E820 PCI hole starts. Also increase the 'maxmem' by the delta
and then when the guest has launched, balloon up to the delta number.

This will require some careful surgery so for right now this parameter
will guard against unsuspecting users seeing their PV guest's memory "vanish."

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Committed-by: Ian Jackson <ian.jackson.citrix.com>
Konrad Rzeszutek Wilk [Thu, 26 May 2011 13:56:30 +0000 (09:56 -0400)]
libxl: Convert E820_UNUSABLE and E820_RAM to E820_UNUSABLE as appropriate.

On most machines, the RAM regions in the E820 are followed by a couple of
E820_RESERVED regions, along with E820_ACPI and E820_NVS. On some Intel
machines, the E820 looks like swiss cheese:

(XEN) Initial Xen-e820 RAM map:
(XEN)  0000000000000000 - 000000000009d000 (usable)
(XEN)  000000000009d000 - 00000000000a0000 (reserved)
(XEN)  00000000000e0000 - 0000000000100000 (reserved)
(XEN)  0000000000100000 - 000000009cf66000 (usable)
(XEN)  000000009cf66000 - 000000009d102000 (ACPI NVS)
(XEN)  000000009d102000 - 000000009f6bd000 (usable)  <--
(XEN)  000000009f6bd000 - 000000009f6bf000 (reserved)
(XEN)  000000009f6bf000 - 000000009f714000 (usable)  <--
(XEN)  000000009f714000 - 000000009f7bf000 (ACPI NVS)
(XEN)  000000009f7bf000 - 000000009f7e0000 (usable)  <--
(XEN)  000000009f7e0000 - 000000009f7ff000 (ACPI data)
(XEN)  000000009f7ff000 - 000000009f800000 (usable)  <--
(XEN)  000000009f800000 - 00000000a0000000 (reserved)
(XEN)  00000000a0000000 - 00000000b0000000 (reserved)
(XEN)  00000000fc000000 - 00000000fd000000 (reserved)
(XEN)  00000000ffe00000 - 0000000100000000 (reserved)
(XEN)  0000000100000000 - 0000000160000000 (usable)

Which means we have to pay attention to the E820_RAM that are
between the E820_[ACPI,NVS,RESERVED]. If we remove those
E820_RAM (b/c the amount of memory passed to the guest
is less than where those E820 regions reside) from the E820, the
Linux kernel interprets those "gaps" as PCI I/O space.
This is what we are currently doing.

This can be disastrous if we pass in an Intel IGD card which tries
to use the first available PCI I/O space - and ends up
using the MFNs which are actually RAM instead of being the
PCI I/O space.

To make this work, we convert all E820_RAM that are above
the 'target_kb' (those that overlap the 'target_kb'
are truncated appropriately) to be E820_UNUSABLE. We also limit this
alteration up to 4GB. This means that an E820 for a guest
from this (target_kb=1024, maxmem=2048):

[    0.000000] Set 405658 page(s) to 1-1 mapping.
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  Xen: 0000000000000000 - 00000000000a0000 (usable)
[    0.000000]  Xen: 00000000000a0000 - 0000000000100000 (reserved)
[    0.000000]  Xen: 0000000000100000 - 0000000040000000 (usable)
[    0.000000]  Xen: 0000000040000000 - 000000009cf66000 (unusable)
[    0.000000]  Xen: 000000009cf66000 - 000000009d102000 (ACPI NVS)
[    0.000000]  Xen: 000000009f6bd000 - 000000009f6bf000 (reserved)
[    0.000000]  Xen: 000000009f714000 - 000000009f7bf000 (ACPI NVS)
[    0.000000]  Xen: 000000009f7e0000 - 000000009f7ff000 (ACPI data)
[    0.000000]  Xen: 000000009f800000 - 00000000b0000000 (reserved)
[    0.000000]  Xen: 00000000fc000000 - 00000000fd000000 (reserved)
[    0.000000]  Xen: 00000000fec00000 - 00000000fec01000 (reserved)
[    0.000000]  Xen: 00000000fee00000 - 00000000fee01000 (reserved)
[    0.000000]  Xen: 00000000ffe00000 - 0000000100000000 (reserved)
[    0.000000]  Xen: 0000000100000000 - 0000000140800000 (usable)

will look like this:

[    0.000000] Set 395880 page(s) to 1-1 mapping.
[    0.000000] BIOS-provided physical RAM map:
[    0.000000]  Xen: 0000000000000000 - 00000000000a0000 (usable)
[    0.000000]  Xen: 00000000000a0000 - 0000000000100000 (reserved)
[    0.000000]  Xen: 0000000000100000 - 0000000040000000 (usable)
[    0.000000]  Xen: 0000000040000000 - 000000009cf66000 (unusable)
[    0.000000]  Xen: 000000009cf66000 - 000000009d102000 (ACPI NVS)
[    0.000000]  Xen: 000000009d102000 - 000000009f6bd000 (unusable)
[    0.000000]  Xen: 000000009f6bd000 - 000000009f6bf000 (reserved)
[    0.000000]  Xen: 000000009f6bf000 - 000000009f714000 (unusable)
[    0.000000]  Xen: 000000009f714000 - 000000009f7bf000 (ACPI NVS)
[    0.000000]  Xen: 000000009f7bf000 - 000000009f7e0000 (unusable)
[    0.000000]  Xen: 000000009f7e0000 - 000000009f7ff000 (ACPI data)
[    0.000000]  Xen: 000000009f7ff000 - 000000009f800000 (unusable)
[    0.000000]  Xen: 000000009f800000 - 00000000b0000000 (reserved)
[    0.000000]  Xen: 00000000fc000000 - 00000000fd000000 (reserved)
[    0.000000]  Xen: 00000000fec00000 - 00000000fec01000 (reserved)
[    0.000000]  Xen: 00000000fee00000 - 00000000fee01000 (reserved)
[    0.000000]  Xen: 00000000ffe00000 - 0000000100000000 (reserved)
[    0.000000]  Xen: 0000000100000000 - 0000000140800000 (usable)
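
The conversion rule can be sketched as a single pass over the map. This is
an illustration under stated assumptions rather than the libxl code: the
struct layout and the function name e820_hide_ram_above are made up here,
the numeric type values are the conventional E820 ones, and ram_end is a
byte address standing in for 'target_kb'.

```c
#include <stddef.h>
#include <stdint.h>

#define E820_RAM      1
#define E820_UNUSABLE 5
#define FOUR_GB       (1ULL << 32)

/* Hypothetical entry layout used only for this sketch. */
struct e820entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/*
 * For RAM entries starting below 4GB: entries entirely above ram_end
 * become E820_UNUSABLE, and an entry that straddles ram_end is
 * truncated so that it ends at ram_end.
 */
static void e820_hide_ram_above(struct e820entry *map, size_t nr,
                                uint64_t ram_end)
{
    for (size_t i = 0; i < nr; i++) {
        struct e820entry *e = &map[i];

        if (e->type != E820_RAM || e->addr >= FOUR_GB)
            continue; /* leave non-RAM and everything above 4GB alone */

        if (e->addr >= ram_end)
            e->type = E820_UNUSABLE;
        else if (e->addr + e->size > ram_end)
            e->size = ram_end - e->addr;
    }
}
```

With ram_end at 0x40000000, the small usable ranges between the ACPI
regions in the host map above come out as unusable, while the range above
4GB is left as it is.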

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Committed-by: Ian Jackson <ian.jackson.citrix.com>
Konrad Rzeszutek Wilk [Thu, 26 May 2011 13:56:26 +0000 (09:56 -0400)]
libxl: Add support for passing in the host's E820 for PCI passthrough

The code that populates E820 is unconditionally triggered by the guest
configuration having "pci=['<BDF>,..']", being a PV guest, and if
b_info->u.pv.e820_host is set.

do_domain_create calls libxl__e820_alloc when
it notices that the guest is PV, has at least one PCI device, and has
the e820_host flag set.

libxl__e820_alloc calls xc_get_machine_memory_map to retrieve the system's
E820. The E820 is then sanitized to weed out entries below 16MB and to
remove any E820_RAM or E820_UNUSED regions, as the guest does not need to
know about them. The guest only needs the E820_ACPI, E820_NVS and
E820_RESERVED regions to get an idea of where the PCI I/O space is. Mostly.
The Linux kernel treats any gap in the E820 as PCI I/O space, which means
that if we pass the guest 2GB, and the E820_ACPI and its friends start at
3GB, the gap between 2GB and 3GB will be considered PCI I/O space. To guard
against that we also create an E820_UNUSABLE region from 'target_kb'
(called ram_end in the code) up to the first E820_[ACPI,NVS,RESERVED] region.
Lastly, xc_domain_set_memory_map is called to install the new E820.
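
The sanitizing step described above might look roughly like the following.
It is a sketch only: e820_sanitize, keep_for_guest and the struct layout
are hypothetical names for illustration, the host map is assumed to be
sorted by address, and reserved regions that start below ram_end are
copied through without any special handling.

```c
#include <stddef.h>
#include <stdint.h>

#define E820_RAM      1
#define E820_RESERVED 2
#define E820_ACPI     3
#define E820_NVS      4
#define E820_UNUSABLE 5
#define SIXTEEN_MB    (16ULL << 20)

/* Hypothetical entry layout used only for this sketch. */
struct e820entry {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Only these region types are passed on to the guest. */
static int keep_for_guest(uint32_t type)
{
    return type == E820_RESERVED || type == E820_ACPI || type == E820_NVS;
}

/*
 * Copy the interesting host entries into dst and return how many were
 * written. An E820_UNUSABLE filler is placed between ram_end and the
 * first kept region above it, so the guest does not mistake that gap
 * for PCI I/O space.
 */
static size_t e820_sanitize(const struct e820entry *src, size_t nr_src,
                            struct e820entry *dst, size_t max_dst,
                            uint64_t ram_end)
{
    size_t n = 0;
    int gap_filled = 0;

    for (size_t i = 0; i < nr_src; i++) {
        const struct e820entry *e = &src[i];

        if (e->addr < SIXTEEN_MB || !keep_for_guest(e->type))
            continue; /* drop low entries, RAM and unusable regions */

        if (!gap_filled && e->addr >= ram_end) {
            if (e->addr > ram_end) {
                if (n == max_dst)
                    return n;
                dst[n].addr = ram_end;
                dst[n].size = e->addr - ram_end;
                dst[n].type = E820_UNUSABLE;
                n++;
            }
            gap_filled = 1;
        }

        if (n == max_dst)
            return n;
        dst[n++] = *e;
    }
    return n;
}
```

The result is what would then be handed to xc_domain_set_memory_map; the
guest's own RAM entry is not produced by this sketch.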

When tested with another PV guest (NetBSD 5.1) the modified E820 gave
it no trouble. The code has also been tested with older "classic" Xen Linux
and with the newer "pvops" with success (SLES11, RHEL5, Ubuntu Lucid,
Debian Squeeze, 2.6.37, 2.6.38, 2.6.39).

Memory that is slack or for balloon (so 'maxmem' in guest configuration)
is put behind the machine E820, which in most cases is above 4GB.

The reason for doing the fetching of the E820 using the hypercall in
the toolstack (instead of the guest doing it) is that when a guest
would do a hypercall to 'XENMEM_machine_memory_map' it would
retrieve an E820 with I/O range caps added in. Meaning that the
region after 4GB up to end of possible memory would be marked as unusable
and the kernel would not have any space to allocate a balloon
region.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Committed-by: Ian Jackson <ian.jackson.citrix.com>
Christoph Egger [Thu, 26 May 2011 14:55:22 +0000 (15:55 +0100)]
libxl: fix build failure (unused variables) for non-Linux platforms

Move variable definitions into Linux-specific sections where they are
actually used. Fixes warning about unused variables.

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Christoph Egger [Thu, 26 May 2011 14:43:22 +0000 (15:43 +0100)]
tools/libfsimage: build fix (ctype macros applied to char)

Fix warning: array subscript has type 'char'
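
For context, this warning typically appears when a plain char is passed to
a <ctype.h> macro. The usual remedy is a cast to unsigned char, sketched
below with a hypothetical helper; it is not the code touched by this
patch.

```c
#include <ctype.h>
#include <stddef.h>

/* Count the decimal digits in a string. */
static size_t count_digits(const char *s)
{
    size_t n = 0;

    for (; *s != '\0'; s++) {
        /*
         * The cast keeps the argument in the range the ctype functions
         * accept; a negative char value would be undefined behaviour.
         */
        if (isdigit((unsigned char)*s))
            n++;
    }
    return n;
}
```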

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
George Dunlap [Thu, 26 May 2011 14:27:34 +0000 (15:27 +0100)]
tools: Enable superpages for HVM domains by default

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson.citrix.com>
George Dunlap [Thu, 26 May 2011 14:27:33 +0000 (15:27 +0100)]
tools: Introduce "allocate-only" page type for migration

To detect presence of superpages on the receiver side, we need
to have strings of sequential pfns sent across on the first iteration
through the memory.  However, as we go through the memory, more and
more of it will be marked dirty, making it wasteful to send those pages.

This patch introduces a new PFINFO type, "XALLOC".  Like PFINFO_XTAB, it
indicates that there is no corresponding page present in the subsequent
page buffer.  However, unlike PFINFO_XTAB, it contains a pfn which should be
allocated.

This new type is only used for migration; but it's placed in
xen/public/domctl.h so that the value isn't reused.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson.citrix.com>
George Dunlap [Thu, 26 May 2011 14:27:32 +0000 (15:27 +0100)]
tools: Save superpages in the same batch, to make detection easier

On the first time through (when pfns are mostly allocated on
the receiving side), try to keep superpages together in the same
batch by ending a batch early if we see the first page of a
potential superpage and there isn't enough room in the batch
for a full superpage.
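
The batching rule can be written as a small predicate. The sketch assumes
2MB superpages made of 512 4k pages; the function and parameter names are
hypothetical, and the check that the batch is non-empty is an addition
here so that an empty batch is never closed.

```c
#include <stddef.h>
#include <stdint.h>

#define SUPERPAGE_NR_PFNS 512u /* assumed: 2MB superpage / 4k pages */

/*
 * Return nonzero if the current batch should be closed before adding
 * pfn, so that the superpage starting at pfn lands in one batch.
 */
static int should_end_batch_early(uint64_t pfn, size_t batch_used,
                                  size_t batch_max)
{
    int starts_superpage = (pfn & (SUPERPAGE_NR_PFNS - 1)) == 0;
    size_t room = batch_max - batch_used;

    return starts_superpage && batch_used > 0 && room < SUPERPAGE_NR_PFNS;
}
```

For example, with a batch limit of 1024 and 600 slots already used, a pfn
aligned to 512 ends the batch, because only 424 slots remain.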

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson.citrix.com>
George Dunlap [Thu, 26 May 2011 14:13:52 +0000 (15:13 +0100)]
tools: libxc: Detect superpages on domain restore

When receiving pages, look for contiguous 2-meg aligned regions and
attempt to allocate a superpage for that region, falling back to
4k pages if the allocation fails.
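
A sketch of the detection test follows, assuming 4k pages so that a 2-meg
region is 512 pfns. The function name is hypothetical; the actual
allocation attempt and the fallback to 4k pages are left to the caller.

```c
#include <stddef.h>
#include <stdint.h>

#define SUPERPAGE_NR_PFNS 512u /* 2MB / 4k */

/*
 * Return nonzero if pfns[] begins with a 2MB-aligned run of 512
 * consecutive pfns, i.e. a region worth trying as a superpage.
 */
static int is_superpage_candidate(const uint64_t *pfns, size_t count)
{
    if (count < SUPERPAGE_NR_PFNS)
        return 0;
    if (pfns[0] & (SUPERPAGE_NR_PFNS - 1))
        return 0; /* first pfn is not 2MB aligned */

    for (size_t i = 1; i < SUPERPAGE_NR_PFNS; i++) {
        if (pfns[i] != pfns[0] + i)
            return 0; /* hole or out-of-order pfn */
    }
    return 1;
}
```

When the test succeeds but the superpage allocation then fails, the same
512 pfns can simply be populated one 4k page at a time.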

(Minor conflict fixed up. -iwj)

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Thu, 26 May 2011 14:08:44 +0000 (15:08 +0100)]
Merge

Stefano Stabellini [Thu, 26 May 2011 14:05:47 +0000 (15:05 +0100)]
libxl: libxl_domain_setmaxmem: actually call xc_domain_setmaxmem

Currently libxl_domain_setmaxmem doesn't do anything, but it should call
xc_domain_setmaxmem to enforce the new "xen maximum" target for the
domain (see tools/libxl/libxl_memory.txt).

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Shriram Rajagopalan [Thu, 26 May 2011 14:04:46 +0000 (15:04 +0100)]
tools: remus: blktap2/block-remus.c - potential write-after-write race fix

At the end of a checkpoint, when a new flush (of buffered disk writes)
is merged with ongoing flush, we have to make sure that none of the new
disk I/O requests overlap with ones in progress. If one does, hold the
request and don't issue I/O until the overlapping one finishes. If we allow
the I/O to proceed, we might end up with two overlapping requests in the
disk's queue and the disk may not offer any guarantee on which one is
written first.
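
The overlap test at the heart of this can be sketched as below. The types
and names (io_range, must_hold) are hypothetical and not taken from
block-remus.c; requests are modelled as half-open sector ranges.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of a disk write: [sector, sector + nr_sectors) */
struct io_range {
    uint64_t sector;
    uint64_t nr_sectors;
};

static int ranges_overlap(const struct io_range *a, const struct io_range *b)
{
    return a->sector < b->sector + b->nr_sectors &&
           b->sector < a->sector + a->nr_sectors;
}

/*
 * Return nonzero if req touches any sector of a write that is still in
 * flight; such a request should be held back until that write finishes.
 */
static int must_hold(const struct io_range *req,
                     const struct io_range *inflight, size_t nr_inflight)
{
    for (size_t i = 0; i < nr_inflight; i++) {
        if (ranges_overlap(req, &inflight[i]))
            return 1;
    }
    return 0;
}
```

Adjacent requests, where one ends exactly where the next begins, do not
count as overlapping and can be issued straight away.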

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Keir Fraser [Thu, 26 May 2011 14:04:29 +0000 (15:04 +0100)]
Rename inj_msi -> inject_msi

Signed-off-by: Keir Fraser <keir@xen.org>
Shriram Rajagopalan [Thu, 26 May 2011 14:03:39 +0000 (15:03 +0100)]
tools: remus: support DRBD disk backends

DRBD disk backends can be used instead of tapdisk backends for Remus.
This requires a Remus style disk replication protocol (asynchronous
replication with output buffering at backup), that is not available in
standard DRBD code. A modified version that supports this new replication
protocol is available from git://aramis.nss.cs.ubc.ca/drbd-8.3-remus

Use of DRBD disk backends provides a means for efficient
resynchronization of data after the crashed machine comes back
online. Since DRBD allows for online resynchronization, a DRBD backed
Remus VM does not have to be stopped or shutdown while the disks are
resynchronizing. Once resynchronization is complete, Remus can be
started at will.

Signed-off-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Wei Liu [Thu, 26 May 2011 14:01:17 +0000 (15:01 +0100)]
libxc: add new operation in HVMOP to deliver emulated MSI.

Signed-off-by: Wei Liu <liuw@liuw.name>
Wei Liu [Thu, 26 May 2011 13:58:28 +0000 (14:58 +0100)]
x86: Add a new operation in HVMOP to inject emulated MSI.

The original vmsi_deliver is renamed to vmsi_deliver_pirq. The new
vmsi_deliver is dedicated to the actual delivery.

Original HVMOP number is unchanged. New operation is numbered 16
and enclosed by (__XEN__) and (__XEN_TOOLS__).

Signed-off-by: Wei Liu <liuw@liuw.name>
Signed-off-by: Keir Fraser <keir@xen.org>
Konrad Rzeszutek Wilk [Thu, 26 May 2011 13:49:50 +0000 (14:49 +0100)]
libxc: xc_domain_set_memory_map, xc_get_machine_memory_map (x86, amd64 only)

Add these two functions.

The latter retrieves the E820 as seen by the hypervisor (completely
unchanged) and the former sets the E820 for the specified guest.

Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Fabio Fantoni [Thu, 26 May 2011 13:38:56 +0000 (14:38 +0100)]
tools/hotplug/Linux: start all xen daemons in runlevel 2

Signed-off-by: Fabio Fantoni <fabio.fantoni@heliman.it>
Acked-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Jackson [Thu, 26 May 2011 13:35:39 +0000 (14:35 +0100)]
libxl: add missing ":" in log messages

libxl__logv would fail to put a ":" between the function name and the
rest of the message.

Reported-by: Zhou Peng <zhoupeng@nfs.iscas.ac.cn>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Zhou Peng [Thu, 26 May 2011 13:33:52 +0000 (14:33 +0100)]
libxl: support "spice" (remote display protocol) with upstream qemu

This patch allows you to use spice for
xen-upstream-qemu on upstream Xen or released Xen-4.1.0.

Nothing needs to be modified in xen-upstream-qemu,
because qemu has included spice's code as a new feature since qemu-0.14.

Usage:

Add spice fields in VM cfg file.  e.g.
    spice=1
    spiceport=6000
    spicehost='192.168.1.187'
    spicedisable_ticketing = 0 # default is 0
    spicepasswd = 'password'
    spiceagent_mouse = 1 # default is 1

Signed-off-by: Zhou Peng <zhoupeng@nfs.iscas.ac.cn>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Tim Deegan [Thu, 26 May 2011 11:37:47 +0000 (12:37 +0100)]
xen: remove extern function declarations from C files.

Move all extern declarations into appropriate header files.
This also fixes up a few places where the caller and the definition
had different signatures.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
Olaf Hering [Thu, 26 May 2011 11:36:27 +0000 (12:36 +0100)]
xentrace: allocate non-contiguous per-cpu trace buffers

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Olaf Hering [Thu, 26 May 2011 11:36:03 +0000 (12:36 +0100)]
xentrace: update __insert_record() to copy the trace record to individual mfns

Update __insert_record() to copy the trace record to individual mfns.
This is a prereq before changing the per-cpu allocation from
contiguous to non-contiguous allocation.

v2:
  update offset calculation to use shift and mask
  update type of mfn_offset to match type of data source
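
The shift-and-mask offset calculation can be illustrated with a copy
routine that writes into separately allocated pages. This is a sketch, not
the __insert_record() code: the names are hypothetical, the page size is
assumed to be 4k, and ordinary pointers stand in for mapped mfns.

```c
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SHIFT 12 /* assumed 4k pages */
#define SKETCH_PAGE_SIZE  ((size_t)1 << SKETCH_PAGE_SHIFT)

/*
 * Copy len bytes to byte offset 'offset' within a buffer that is made
 * of individual pages instead of one contiguous allocation.
 */
static void copy_to_pages(unsigned char *const *pages, size_t offset,
                          const void *src, size_t len)
{
    const unsigned char *s = src;

    while (len > 0) {
        size_t idx     = offset >> SKETCH_PAGE_SHIFT;      /* which page */
        size_t in_page = offset & (SKETCH_PAGE_SIZE - 1);  /* where in it */
        size_t chunk   = SKETCH_PAGE_SIZE - in_page;

        if (chunk > len)
            chunk = len;
        memcpy(pages[idx] + in_page, s, chunk);

        s += chunk;
        offset += chunk;
        len -= chunk;
    }
}
```

A record that crosses a page boundary is split into two copies, one to the
tail of the first page and one to the head of the next.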

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Olaf Hering [Thu, 26 May 2011 11:35:30 +0000 (12:35 +0100)]
xentrace: fix type of offset to avoid out-of-bounds access

Update the type of the local offset variable to match the type where
this variable is stored. Also update the type of t_info_first_offset
because it also has a limited range.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Olaf Hering [Thu, 26 May 2011 11:34:44 +0000 (12:34 +0100)]
xentrace: reduce trace buffer size to something mfn_offset can reach

The start of the array which holds the list of mfns for each cpu's
trace buffer is stored in an unsigned short. This limits the total
number of pages for each cpu as the number of active cpus increases.

Update the math in calculate_tbuf_size() to also apply this rule to
the max number of trace pages. Without this change the index can
overflow.
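
To make the limit concrete, the sketch below assumes that the mfn list of
cpu N starts at first_offset + N * pages and that this start index has to
fit in 16 bits. Both that layout and the function name are assumptions for
illustration; the math in calculate_tbuf_size() may differ in detail.

```c
#include <stdint.h>

/*
 * Clamp the requested number of trace pages per cpu so that the start
 * index of the last cpu's mfn list still fits in an unsigned short.
 */
static uint32_t max_pages_per_cpu(uint32_t first_offset, uint32_t nr_cpus,
                                  uint32_t requested)
{
    uint32_t limit;

    if (nr_cpus <= 1)
        return requested; /* a single list starts at first_offset */
    if (first_offset >= UINT16_MAX)
        return 0;         /* no room for any further list */

    /* need: first_offset + (nr_cpus - 1) * pages <= UINT16_MAX */
    limit = (UINT16_MAX - first_offset) / (nr_cpus - 1);
    return requested < limit ? requested : limit;
}
```

Under these assumptions, four cpus with a first offset of 35 allow at most
21833 pages each, and the limit shrinks as more cpus come online.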

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: George Dunlap <george.dunlap@eu.citrix.com>
Wei Liu [Thu, 26 May 2011 07:21:00 +0000 (08:21 +0100)]
x86: fix macro names typo in VMSI code (GLFAGS->GFLAGS)

Signed-off-by: Wei Liu <liuw@liuw.name>
Ian Campbell [Thu, 26 May 2011 07:18:44 +0000 (08:18 +0100)]
IOMMU: Fail if intremap is not available and iommu=required/force.

Rather than sprinkling panic()s throughout the setup code, hoist the
check up into common code.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Signed-off-by: Keir Fraser <keir@xen.org>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
W. Michael Petullo [Wed, 25 May 2011 09:45:24 +0000 (10:45 +0100)]
tools/hotplug: support vif-post.d hook arrangements

New feature: you can drop hook scripts into
 /etc/xen/scripts/vif-post.d/*.hook

Acked-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Stefano Stabellini [Tue, 24 May 2011 17:33:06 +0000 (18:33 +0100)]
libxl: fix initialization of the disk parsing iterator

Fix the initialization of the disk parsing iterator.

Signed-off-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Tue, 24 May 2011 17:27:50 +0000 (18:27 +0100)]
libxl: refactor libxl__domain_firmware to choose based on
device_model_version

Note that the default remains "hvmloader" in both cases; this just
clarifies the intent for now.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Tue, 24 May 2011 17:26:47 +0000 (18:26 +0100)]
libxl: fixup error handling in libxl__build_hvm.

We first pointlessly initialise rc and immediately overwrite the value, then fail
to return it on error anyway...

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Tue, 24 May 2011 17:25:56 +0000 (18:25 +0100)]
libxl: pass device model info down into HVM domain build functions.

The builder will soon need to know the device model version in order to select
the correct firmware.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Tue, 24 May 2011 17:24:58 +0000 (18:24 +0100)]
libxl/xl: Use "firmware" rather than "hvmloader" in API.

23251:0710f53cef4a turned build_info.kernel into
build_info.hvm.hvmloader; however, this is a rather specific name for a
field which may be used to load things which aren't hvmloader in the
future. Switch to calling the field and associated configuration items
"firmware" instead.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Tue, 24 May 2011 17:24:05 +0000 (18:24 +0100)]
tools: libxc: allow HVM firmware to be loaded at an arbitrary alignment

Enables direct loading of e.g. seabios.elf.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Tue, 24 May 2011 17:22:23 +0000 (18:22 +0100)]
tools: libxl: constify parameter to libxl__abs_path

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Tue, 24 May 2011 17:21:45 +0000 (18:21 +0100)]
tools: libxc: enable libelf logging for HVM domain builder.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Marco Nenciarini [Tue, 24 May 2011 17:11:24 +0000 (18:11 +0100)]
pygrub: XFS support for pygrub

Signed-off-by: Marco Nenciarini <marco.nenciarini@devise.it>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
Ian Campbell [Tue, 24 May 2011 16:52:02 +0000 (17:52 +0100)]
tools: ocaml: remove non-posix-ism from sed script.

Christoph Egger reported that on NetBSD the build fails with

Parsing tools/ocaml/libs/xl/../../../../tools/libxl/libxl.idl
sed: 1: "1i(*\
  * AUTO-GENERATED ...": command i expects \ followed by test
gmake[7]: Leaving directory `tools/ocaml/libs/xl'

The following was tested by Christoph on NetBSD and also with GNU-sed
with and without the --posix flag.

In addition, when sed fails it will still create the output file, which confuses
subsequent make invocations. Generate to a temporary file and move it into place
only on success.
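
The "generate to a temporary file, then move into place" pattern can be
sketched as follows. This is a minimal illustration in Python, not the
Makefile change itself; generate_atomically and its produce callback are
hypothetical names.

```python
import os
import tempfile


def generate_atomically(path, produce):
    """Write produce() to a temporary file, then rename it onto `path`.

    `produce` is a hypothetical callback standing in for the generator
    (the sed invocation in the build); it returns the output as a string.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the rename stays
    # on one filesystem.
    fd, tmp = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "w") as out:
            out.write(produce())
        os.replace(tmp, path)  # only reached when generation succeeded
    except BaseException:
        os.unlink(tmp)  # never leave a partial output file behind
        raise
```

If the generator fails, the target is never created, so a later make run
still sees it as missing and regenerates it.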

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Christoph Egger <Christoph.Egger@amd.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years agolibxc: save: move static stats variable to stack variable.
Ian Campbell [Tue, 24 May 2011 09:14:10 +0000 (10:14 +0100)]
libxc: save: move static stats variable to stack variable.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson.citrix.com>
Committed-by: Ian Jackson <ian.jackson.citrix.com>
14 years agolibxc: save: don't bother calculating stat's deltas unless we are going to print...
Ian Campbell [Tue, 24 May 2011 09:14:10 +0000 (10:14 +0100)]
libxc: save: don't bother calculating stat's deltas unless we are going to print them

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson.citrix.com>
Committed-by: Ian Jackson <ian.jackson.citrix.com>
14 years agolibxc: save: encapsulate time stats in a struct
Ian Campbell [Tue, 24 May 2011 09:14:10 +0000 (10:14 +0100)]
libxc: save: encapsulate time stats in a struct

As a precursor to making them non-static.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson.citrix.com>
Committed-by: Ian Jackson <ian.jackson.citrix.com>
14 years agolibxc: save: move static "write_count" variable into outbuf.
Ian Campbell [Tue, 24 May 2011 09:14:10 +0000 (10:14 +0100)]
libxc: save: move static "write_count" variable into outbuf.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson.citrix.com>
Committed-by: Ian Jackson <ian.jackson.citrix.com>
14 years agolibxc: save: noncached write doesn't use live parameter.
Ian Campbell [Tue, 24 May 2011 09:14:10 +0000 (10:14 +0100)]
libxc: save: noncached write doesn't use live parameter.

so drop it.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson.citrix.com>
Committed-by: Ian Jackson <ian.jackson.citrix.com>
14 years agolibxc: save: rename ratewrite to uncached.
Ian Campbell [Tue, 24 May 2011 09:14:10 +0000 (10:14 +0100)]
libxc: save: rename ratewrite to uncached.

It doesn't do any ratelimiting...

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson.citrix.com>
Committed-by: Ian Jackson <ian.jackson.citrix.com>
14 years agolibxc: save: drop code under ADAPTIVE_SAVE ifdef.
Ian Campbell [Tue, 24 May 2011 09:14:10 +0000 (10:14 +0100)]
libxc: save: drop code under ADAPTIVE_SAVE ifdef.

The ifdef was added in 2005 (7702:b3c2bc39d815) but, as far as I can see, was
never enabled by default. Dropping it will help untangle some macros redefining
functions etc.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson.citrix.com>
Committed-by: Ian Jackson <ian.jackson.citrix.com>
14 years agolibxc: save/restore: remove static context variables
Ian Campbell [Tue, 24 May 2011 09:14:10 +0000 (10:14 +0100)]
libxc: save/restore: remove static context variables

20544:ad9d75d74bd5 and 20545:cc7d66ba0dad seemingly intended to change these
global static variables into stack variables but didn't remove the static
qualifier.

Also zero the entire struct once with memset rather than clearing fields
piecemeal in two different places.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson.citrix.com>
Committed-by: Ian Jackson <ian.jackson.citrix.com>
Acked-by: Vincent Hanquez <vincent.hanquez@eu.citrix.com>
14 years agolibxl: libxl__xs_write format string should be const.
Ian Campbell [Tue, 24 May 2011 16:12:27 +0000 (17:12 +0100)]
libxl: libxl__xs_write format string should be const.

George Dunlap reports that gcc 4.4.3 complains:
  libxl_dm.c: In function libxl__create_device_mode:
  libxl_dm.c:776: error: format not a string literal and no format arguments
And indeed the format argument here is a char * from libxl__domain_bios().

Make the argument to libxl__xs_write a const char * and change
libxl__domain_bios to return a const char * too.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years agolibxl: don't close file descriptors 4..255 in libxl__exec
Ian Campbell [Tue, 24 May 2011 15:18:19 +0000 (16:18 +0100)]
libxl: don't close file descriptors 4..255 in libxl__exec

It prevents callers from deliberately passing file descriptors to the child,
hides any callers who erroneously do so and doesn't deal with all file
descriptors in any case.

Rather than remove it altogether, replace it with some debug code
which checks for open file handles which do not have either O_CLOEXEC
or FD_CLOEXEC set. To enable the debug code, set _LIBXL_DEBUG_EXEC_FDS=1
to print any open and non-CLOEXEC non-stdio FDs just before libxl__exec
actually calls exec. Set _LIBXL_DEBUG_EXEC_FDS=2 to abort if any of
these exist.
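
The kind of check described above can be sketched as follows. This is an
illustration in Python rather than the C debug code in libxl; the
function name non_cloexec_fds and the default range 3..255 are
assumptions made for the example.

```python
import fcntl
import os


def non_cloexec_fds(first=3, last=255):
    """Return open descriptors in [first, last] that lack FD_CLOEXEC.

    Descriptors 0..2 (stdio) are skipped by default, since they are
    expected to be inherited across exec.
    """
    leaked = []
    for fd in range(first, last + 1):
        try:
            flags = fcntl.fcntl(fd, fcntl.F_GETFD)
        except OSError:
            continue  # descriptor is not open
        if not flags & fcntl.FD_CLOEXEC:
            leaked.append(fd)  # would survive an exec
    return leaked
```

A caller could print the returned list just before exec, or abort if it
is non-empty, mirroring the two debug levels above.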

On the basis of this debugging fix some leaked filehandles:
  * The read end of the pipe used to wake the parent from the
    intermediate process during libxl__spawn_spawn was leaked into the
    intermediate process.
  * The file descriptor representing the xl lock was not marked
    O_CLOEXEC (the lock itself is already specified to not be
    inherited across a fork).
  * The file descriptors passed to libxl__exec to be dup'd as
    std{in,out,err} were leaked at their original number. They are
    harmless (an attacker can just as easily use fd 0..2) but close
    anyway since it removes a case which a person evaluating open fd's
    needs to consider.
  * libxl_run_bootloader was leaking the xenconsole pty master into
    the bootloader child process.
  * If the bootloader fails to get as far as opening its end of the
    FIFO then we can also hang, so check that the process has not
    exited as part of that loop. (We actually block opening the FIFO
    too, so this is only a partial fix for the case where the
    bootloader has crashed quickly.)

With these fixes I have tested that device models, bootloaders
(pygrub) and vncviewers spawned via libxl__exec start with no
unexpected file descriptors open, at least in my configuration.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years agolibxl: improve logging on failure to start device model.
Ian Campbell [Tue, 24 May 2011 15:15:53 +0000 (16:15 +0100)]
libxl: improve logging on failure to start device model.

Distinguish between device model dying during startup (libxl__spawn_check
returns failure) and timing out while waiting for the xenstore node to show up.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years agoremus: write circumlocution for try..except..finally
Jinsong Liu [Tue, 24 May 2011 15:12:25 +0000 (16:12 +0100)]
remus: write circumlocution for try..except..finally

Parsing /otc/source/vtd/xen-unstable/tools/python/../../tools/libxl/libxl.idl
  File "/usr/lib64/python2.4/site-packages/xen/remus/save.py", line 169
    finally:
          ^
SyntaxError: invalid syntax

This was introduced in 23195:13ec53a59a42
It is a problem for Python 2.4 and earlier, only.

So use try...(try...except)...finally as suggested by Ian Campbell.
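
The nested form can be sketched as below. Python 2.5 and later accept
try/except/finally as a single statement; the nesting is only needed for
2.4 and earlier. The names run, action and log are hypothetical and not
taken from save.py.

```python
def run(action, log):
    """Equivalent of try/except/finally that Python 2.4 can also parse."""
    try:
        try:
            action()
        except ValueError:
            log.append("handled")  # error path
    finally:
        log.append("cleanup")  # runs whether or not action() raised
```

The inner try handles the exception; the outer try guarantees that the
cleanup still runs afterwards.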

Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
Acked-by: Shriram Rajagopalan <rshriram@cs.ubc.ca>
Acked-by: Ian Campbell <Ian.Campbell@eu.citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years agolibxl: add startup checks to libxl__wait_for_device_model
Ian Campbell [Tue, 24 May 2011 14:57:24 +0000 (15:57 +0100)]
libxl: add startup checks to libxl__wait_for_device_model

When the device model is starting up push checks for spawn failure down into
libxl__wait_for_device_model, allowing us to fail more quickly when the device
model fails to start (e.g. due to a missing library or an early setup error
etc).

In order to allow the select loop in libxl__wait_for_device_model to wake when
the child dies, add a pipe between the parent and the intermediate process which
the intermediate process can use to signal the parent.
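
One way a pipe can wake a select loop when a child goes away is sketched
below. This is a simplified Python illustration, not the libxl
implementation: here the child signals the parent implicitly, because
its copy of the write end closes on exit and the read end becomes
readable (EOF). The name wait_for_child_exit is hypothetical.

```python
import os
import select


def wait_for_child_exit(child_fn, timeout=5.0):
    """Fork a child running child_fn; return True if select() saw it exit."""
    r, w = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: keep only the write end; it closes when we exit.
        os.close(r)
        try:
            child_fn()
        finally:
            os._exit(0)
    # Parent: drop our copy of the write end so EOF can be observed.
    os.close(w)
    try:
        ready, _, _ = select.select([r], [], [], timeout)
        woke = bool(ready)
    finally:
        os.close(r)
    os.waitpid(pid, 0)  # reap the child
    return woke
```

A real wait loop would select on this descriptor together with whatever
else it is waiting for, so an early child death is noticed without
waiting for the full timeout.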

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years agolibxl: pass libxl__spawn_starting to libxl__spawn_spawn.
Ian Campbell [Tue, 24 May 2011 14:21:26 +0000 (15:21 +0100)]
libxl: pass libxl__spawn_starting to libxl__spawn_spawn.

Passing a libxl__device_model_starting to a generic function and expecting it
to scrabble around inside for the generic data structure is a strange interface.
Instead pass in a libxl__spawn_starting and an opaque hook data pointer.

The for_spawn member of libxl__device_model_starting was annotated with
"first!", suggesting that someone intended to use pointer casting tricks to
move between the outer and inner struct. However the field is a pointer, not an
inline struct, so this doesn't work (and it isn't used this way anyhow). Remove
the comment, and move the field away from the front for good measure.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years agolibxl: remove redundant call to libxl_domain_device_model
Ian Campbell [Tue, 24 May 2011 14:17:07 +0000 (15:17 +0100)]
libxl: remove redundant call to libxl_domain_device_model

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years agolibxl: check that device model binary is executable.
Ian Campbell [Tue, 24 May 2011 14:15:27 +0000 (15:15 +0100)]
libxl: check that device model binary is executable.

This causes us to fail more quickly in the more obvious failure case of not
having the right binary installed.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years agolibxc: obtain correct length of p2m during core dumping
Markus Gross [Tue, 24 May 2011 14:00:16 +0000 (15:00 +0100)]
libxc: obtain correct length of p2m during core dumping

While implementing core dumping functionality for the libxl driver
of libvirt, I discovered an issue with mapping pages of a pv guest.

After dumping the core of a pv guest the domain was not cleared up
properly and some pages were not unmapped. This issue is similar
to the one reported here:
http://lists.xensource.com/archives/html/xen-devel/2011-05/msg01314.html

In xc_domain_dumpcore_via_callback in the file xc_core.c the function
xc_core_arch_map_p2m is called to map P2M_FL_ENTRIES pages to the variable p2m.
But to unmap the pages later, the dinfo->p2m_size has to be set accordingly.
This was not done, instead a variable named p2m_size was set.
This way P2M_FL_ENTRIES was always zero and the pages were left mapped.

[ This change should be considered for backport to relevant trees. ]

Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years agolibxc: after saving, unmap correct amount for live_m2p
Jim Fehlig [Tue, 24 May 2011 13:50:00 +0000 (14:50 +0100)]
libxc: after saving, unmap correct amount for live_m2p

With some help from Olaf, I've finally got to the bottom of an issue I
came across while trying to implement save/restore in the libvirt
libxenlight driver.  After issuing the save operation, the saved domain
was not being cleaned up properly and left in this state from xl's
perspective

xen33:# xl list
Name                   ID   Mem VCPUs      State   Time(s)
Domain-0                0  6821     8     r-----     122.5
(null)                  2     2     2     --pssd      10.8

Checking the libvirtd /proc/$pid/maps I found this

7f3798984000-7f3798b86000 r--s 00002000 00:03 4026532097 /proc/xen/privcmd

So not all pages belonging to the domain were unmapped from
libvirtd.  In tools/libxc/xc_domain_save.c we found that P2M_FL_ENTRIES
were being mapped but only P2M_FLL_ENTRIES were being unmapped.  The
attached patch changes the unmapping to use the same P2M_FL_ENTRIES
macro.  I'm not too familiar with this code though so posting here for
review.

I suspect this was not noticed before since most (all?) processes doing
save terminate after the save and are not long-running like libvirtd.

Ian Campbell writes:
> Looks like I introduced this in 18558:ccf0205255e1, sorry!
>
> I guess it is also wrong in the error path out of map_and_save_p2m_table
> and so we also need [another hunk].

This change should be backported to relevant earlier trees. -iwj

From: Jim Fehlig <jfehlig@novell.com>
From: Ian Campbell <Ian.Campbell@citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Cc: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Campbell <Ian.Campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years agox86/mm: Fix one more check in the EPT p2m code
Tim Deegan [Tue, 24 May 2011 08:30:51 +0000 (09:30 +0100)]
x86/mm: Fix one more check in the EPT p2m code

This is one more place that needs to check for 0 entries
after the AMD p2m-sharing patch made p2m_ram_rw == 0

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years agodrivers/passthrough: fix error paths in pci_add_device*()
Tim Deegan [Mon, 23 May 2011 17:35:32 +0000 (18:35 +0100)]
drivers/passthrough: fix error paths in pci_add_device*()

When a device can't be allocated to dom0 by the IOMMU, don't leave
dom0 in the "domain" field.  It causes pci_remove_device()
to crash trying to remove the dev from the domain's list of devices
(and was probably the wrong thing to do anyway).

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years agodrivers/passthrough: Revert 23352:ea48976517af -- incorrect bugfix.
Keir Fraser [Mon, 23 May 2011 17:35:04 +0000 (18:35 +0100)]
drivers/passthrough: Revert 23352:ea48976517af -- incorrect bugfix.

Signed-off-by: Keir Fraser <keir@xen.org>
14 years agoFix Config.mk's cc-option for -Wno-* options.
Keir Fraser [Mon, 23 May 2011 16:38:28 +0000 (17:38 +0100)]
Fix Config.mk's cc-option for -Wno-* options.

These disable-warning options are handled specially by GCC:
 (a) they are ignored unless the compiler emits a warning; and
 (b) even then they produce a warning rather than an error

To handle this, modify the test invocation of GCC to compile a
fragment of code that will always provoke a warning (integer assigned
to pointer). This works around (a) above.

Then, we grep the compiler's stdout/stderr for the option-under-test,
the presence of which would indicate an "unrecognized command-line
option" warning/error. This works around (b) above, letting us
distinguish between the "integer assigned to pointer" and
"unrecognized command-line option" warnings.

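The grep step can be sketched as a small decision function. This is an
illustration in Python rather than the make/shell code in Config.mk; the
name option_accepted and the sample diagnostics in the comments are
assumptions chosen for the example.

```python
def option_accepted(compiler_output, option):
    """Decide whether the compiler recognised `option`.

    compiler_output is the combined stdout/stderr from compiling a
    fragment that always warns (an integer assigned to a pointer).
    An unrecognised option is named in the diagnostics; the expected
    "pointer from integer" warning does not mention it.
    """
    return option not in compiler_output


# Illustrative diagnostics only, not captured from a real build:
#   accepted: "warning: initialization makes pointer from integer ..."
#   rejected: 'warning: unrecognized command line option "-Wno-foo"'
```

Searching for the option text, rather than checking the exit status, is
what separates the two kinds of warning.
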
Signed-off-by: Keir Fraser <keir@xen.org>
14 years agogcc-4.6 compile fix: build with -Wno-unused-but-set-variable
Olaf Hering [Sat, 21 May 2011 06:55:46 +0000 (07:55 +0100)]
gcc-4.6 compile fix: build with -Wno-unused-but-set-variable

Avoid "error: variable 'unused' set but not used
[-Werror=unused-but-set-variable]" with gcc 4.6.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
14 years agotools: Fix build failure with gcc 4.4.3-4ubuntu5
George Dunlap [Fri, 20 May 2011 17:20:09 +0000 (18:20 +0100)]
tools: Fix build failure with gcc 4.4.3-4ubuntu5

c/s 23253:a3db6b91f32d causes build failure with gcc 4.4.3-4ubuntu5,
as the compiler can't figure out that the value returned is always
a string literal.

Signed-off-by: George Dunlap <george.dunlap@eu.citrix.com>
14 years agolibxl: turn some stray printf's in libxl into LIBXL__LOG.
Ian Campbell [Fri, 20 May 2011 17:12:41 +0000 (18:12 +0100)]
libxl: turn some stray printf's in libxl into LIBXL__LOG.

These appear to have been left over from when the domain create stuff was
pushed down from xl into the library.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years agohotplug: fix busy loop device detection
Olaf Hering [Fri, 20 May 2011 17:09:26 +0000 (18:09 +0100)]
hotplug: fix busy loop device detection

Improve busy loop device detection after changeset 22773:02c0af2bf280

The intention is not to find the file to be mounted in the losetup -a
output.  What matters are existing mounted files with the same dev:inode
as the new file.  So the fix is to apply variable expansion which
happens only without double quotes.  Otherwise $dev will contain
newlines for hardlinked files, as mentioned in the commit message from
the changeset above.

losetup -a does also truncate long filenames to 62 chars due to ioctl
limitations.  This part is fixed with 2.6.37 where the filename can be
obtained from sysfs. As a result very long filenames will be missed.

Signed-off-by: Olaf Hering <olaf@aepfle.de>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years agolibxl: make it possible to disable vnc
Zhou Peng [Fri, 20 May 2011 15:11:13 +0000 (16:11 +0100)]
libxl: make it possible to disable vnc

tools/libxl/libxl__build_device_model_args_new/old: The condition is
so strict that the user has no chance to disable vnc, considering
what parse_config_data() does by default. This is not reasonable
given the vnc option in the vm-cfg file.

If the user explicitly sets "vnc=0", vnc should be disabled.
The user should have the chance to use only sdl, another remote
display (spice), or even nothing.

Signed-off-by: Zhou Peng <zhoupeng@nfs.iscas.ac.cn>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years agoxen: Include headers that are actually needed, drop everything else.
Christoph Egger [Fri, 20 May 2011 14:39:07 +0000 (15:39 +0100)]
xen: Include headers that are actually needed, drop everything else.

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
14 years agox86/mca: Fix debug output.
Liu, Jinsong [Fri, 20 May 2011 12:42:23 +0000 (13:42 +0100)]
x86/mca: Fix debug output.

In x86_mcinfo_dump(), a small bug in the printk output misleadingly
reports a CMCI/POLLED error as an MCE error, which makes debugging
confusing.

Signed-off-by: Liu, Jinsong <jinsong.liu@intel.com>
14 years agoRemove unused var from acpi_idle_do_entry().
Keir Fraser [Fri, 20 May 2011 08:44:41 +0000 (09:44 +0100)]
Remove unused var from acpi_idle_do_entry().

Signed-off-by: Keir Fraser <keir@xen.org>
14 years agoxen: Remove some initialised but otherwise unused variables.
Olaf Hering [Fri, 20 May 2011 08:33:53 +0000 (09:33 +0100)]
xen: Remove some initialised but otherwise unused variables.

Fixes the build under gcc-4.6 -Werror=unused-but-set-variable

Signed-off-by: Olaf Hering <olaf@aepfle.de>
14 years agogcc-4.6 compile fix: tools/libxc/xc_domain_restore.c
Olaf Hering [Fri, 20 May 2011 08:18:17 +0000 (09:18 +0100)]
gcc-4.6 compile fix: tools/libxc/xc_domain_restore.c

xc_domain_restore.c: In function 'xc_domain_restore':
xc_domain_restore.c:1090:18: error: variable 'prev_pc' set but not
used [-Werror=unused-but-set-variable]

Signed-off-by: Olaf Hering <olaf@aepfle.de>
14 years agogcc-4.6 compile fix: tools/libxc/xc_tmem.c
Olaf Hering [Fri, 20 May 2011 08:17:46 +0000 (09:17 +0100)]
gcc-4.6 compile fix: tools/libxc/xc_tmem.c

xc_tmem.c: In function 'xc_tmem_restore':
xc_tmem.c:393:14: error: variable 'save_max_pools' set but not used
[-Werror=unused-but-set-variable]

Signed-off-by: Olaf Hering <olaf@aepfle.de>
14 years agohvmloader: always include HPET table
Paolo Bonzini [Fri, 20 May 2011 08:15:40 +0000 (09:15 +0100)]
hvmloader: always include HPET table

Windows SVVP tests require an HPET table even if the HPET is disabled.
This makes sense since the HPET _is_ in the DSDT and, while the OS
does not know that, in principle its status may change.

(For what it's worth SeaBIOS, in addition to doing this, totally
ignores QEMU's -no-hpet flag and always reports 0x0f for the HPET's
_STA method).

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Keir Fraser <keir@xen.org>
14 years agox86/AMD: don't set ARAT feature flag on family F CPUs
Jan Beulich [Fri, 20 May 2011 08:11:54 +0000 (09:11 +0100)]
x86/AMD: don't set ARAT feature flag on family F CPUs

Following Linux commit 14fb57dccb6e1defe9f89a66f548fcb24c374c1d from
Borislav Petkov.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
14 years agonestedsvm: reduce TLB flushes
Keir Fraser [Fri, 20 May 2011 08:07:54 +0000 (09:07 +0100)]
nestedsvm: reduce TLB flushes

Reduce TLB flushes:
1. When we update the cr3 during VMRUN/VMEXIT emulation
    we toggle between n1asid and n2asid forth and back
    => no TLB flush needed
2. Only flush n1asid or n2asid depending on vcpu guest mode
    and not both unconditionally.

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Signed-off-by: Keir Fraser <keir@xen.org>
14 years agonestedsvm: reduce TLB flushes
Keir Fraser [Fri, 20 May 2011 08:06:58 +0000 (09:06 +0100)]
nestedsvm: reduce TLB flushes

Reduce TLB flushes:
1. When we update the cr3 during VMRUN/VMEXIT emulation
    we toggle between n1asid and n2asid forth and back
    => no TLB flush needed
2. Only flush n1asid or n2asid depending on vcpu guest mode
    and not both unconditionally.

Signed-off-by: Christoph Egger <Christoph.Egger@amd.com>
Signed-off-by: Keir Fraser <keir@xen.org>
14 years agox86: clear CPUID output of leaf 0xd for Dom0 when xsave is disabled
Jan Beulich [Fri, 20 May 2011 07:54:45 +0000 (08:54 +0100)]
x86: clear CPUID output of leaf 0xd for Dom0 when xsave is disabled

Linux starting with 2.6.36 uses the XSAVEOPT instruction and has
certain code paths that look only at the feature bit reported through
CPUID leaf 0xd sub-leaf 1 (i.e. without qualifying the check with one
evaluating leaf 4 output). Consequently the hypervisor ought to mimic
actual hardware in clearing leaf 0xd output when not supporting xsave.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
14 years agopci_remove_device: fix linked list discipline
Tim Deegan [Fri, 20 May 2011 07:52:22 +0000 (08:52 +0100)]
pci_remove_device: fix linked list discipline

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years agoMakefile: install-tools does not depend on ioemu-dir if CONFIG_IOEMU=n
Keir Fraser [Fri, 20 May 2011 07:48:33 +0000 (08:48 +0100)]
Makefile: install-tools does not depend on ioemu-dir if CONFIG_IOEMU=n

Based on patch by George Dunlap.

Signed-off-by: Keir Fraser <keir@xen.org>
14 years agotools: xl: add option to run in foreground but still monitor for reboot etc
Ian Campbell [Tue, 17 May 2011 16:32:19 +0000 (17:32 +0100)]
tools: xl: add option to run in foreground but still monitor for reboot etc

Split daemonization option out from monitoring a domain for reboot
etc. The 'e' option continues to disable both and a new 'F'(oreground)
option disables only daemonization.

When I'm debugging xl in the foreground this is often the behaviour I
would like.

Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Acked-by: Ian Jackson <ian.jackson@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years agolibxl: Fix apic/acpi confusion
Ian Jackson [Tue, 17 May 2011 16:28:12 +0000 (17:28 +0100)]
libxl: Fix apic/acpi confusion

"apic" was written a couple of times where "acpi" was meant.

Signed-off-by: Zhou Peng <zhoupeng@nfs.iscas.ac.cn>
Signed-off-by: Ian Jackson <ian.jackson@eu.citrix.com>
Acked-by: Ian Campbell <Ian.Campbell@eu.citrix.com>
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Committed-by: Ian Jackson <ian.jackson@eu.citrix.com>
14 years agox86: further adjustments to arch_set_info_guest() after c/s 23142:f5e8d152a565
Jan Beulich [Tue, 17 May 2011 12:55:45 +0000 (13:55 +0100)]
x86: further adjustments to arch_set_info_guest() after c/s 23142:f5e8d152a565

The adjustments to v->arch.user_regs.eflags and the initialization of
the int80 direct trap must be done earlier (namely before the function
may bail because of inconsistencies between input and stored state on
an already initialised vCPU) so that stored state is consistent, and
for arch_get_info_guest() to not have its eflags related BUG_ON()
triggered.

Further, v->arch.pv_vcpu.ctrlreg[] indices 3 and 1 aren't being kept
up to date while the domain is running, so consistency checks must
instead be done against v->arch.guest_table{,_user}.

Additionally, for 64-bit pv domains, CR1 must also be checked to be
consistent with the kernel mode setting for the vCPU, and the whole
CR1 checking should not be done for 32-bit pv domains.

Signed-off-by: Jan Beulich <jbeulich@novell.com>
14 years agoBuild target to wrap dist/install in a .deb archive.
Tim Deegan [Mon, 16 May 2011 12:34:25 +0000 (13:34 +0100)]
Build target to wrap dist/install in a .deb archive.

Adds "make deb", which does a "make dist" build and wraps the
resulting dist/install files in dist/xen-<version>.deb

This is _not_ a "packaged" version of Xen for Debian users, nor is it
intended to compete with anyone else's packaging efforts.  In
particular it doesn't do any of the boot-time or fstab fixups needed
to actually start the xen tools.  It's just a quick hack for
developers to be able to quickly install and uninstall a Xen build on
a test box.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
14 years agoamd-iommu: two more __init annotations
Jan Beulich [Mon, 16 May 2011 12:33:10 +0000 (13:33 +0100)]
amd-iommu: two more __init annotations

Signed-off-by: Jan Beulich <jbeulich@novell.com>
14 years agox86-64: remove left over uses of .got entries
Jan Beulich [Mon, 16 May 2011 12:32:37 +0000 (13:32 +0100)]
x86-64: remove left over uses of .got entries

These were caused by some declarations happening before the compiler
would have seen the visibility pragma.

Signed-off-by: Jan Beulich <jbeulich@novell.com>